Deepomics, an Information System for Omics Data
The project involved the evolutionary maintenance and semantic enhancement of the Deepomics Information System API, a platform dedicated to Omics data, as well as the development of end-to-end (e2e) tests. Several missions were carried out in collaboration with the PROSE laboratory (PRocédés biOtechnologiques au Service de l'Environnement).
The first mission took place between September 2022 and December 2023, followed by a second mission in July 2024, specifically targeting the development of e2e tests for complete usage scenarios. Currently, evolutionary maintenance is ongoing.
The project posed two main challenges: keeping the platform running properly through regular maintenance, and carrying out the semantic enhancement, which is crucial for making public scientific resources accessible and usable. The development was supervised by the Information Systems Department (DSI) of INRAE (National Research Institute for Agriculture, Food and Environment), with the UMR (Joint Research Unit) PROSE responsible for scientific and business aspects. A close collaboration was also established with the MISTEA UMR for the semantic enhancement mission.
Tasks & Objectives
As a full-stack developer and semantic expert, I was also responsible for QA engineering for e2e scenarios. My role involved developing and debugging both the frontend and the backend, while working on the semantic enhancement of the model to enable exports compliant with Semantic Web standards. One of the main objectives was to ensure smooth user journeys, particularly for data deposition and retrieval on the platform.
Success criteria included not only bug fixes and application maintenance but also a complete semantic enhancement of the API, aligned with the platform's technologies, particularly API Platform, the Symfony-based API framework. A key objective was to decouple semantic enhancement work from developers, allowing scientists to define relationships through ontologies. Finally, it was essential to develop robust e2e tests.
Actions and Development
My first step was to familiarize myself with the Deepomics environment, which consists of a backend built with Symfony and an Angular frontend. I then created a dedicated repository to host the business ontology for semantic enhancement and set up a pipeline that extracts content from this repository and automatically generates annotations for API Platform. For e2e tests, I used Robot Framework with the Selenium library.
Regular exchanges with the project, scientific, and IT teams, as well as with the former development team, facilitated my work. Collaboration with the MISTEA UMR was crucial for developing a common ontology and establishing a shared vocabulary. Given the project's complexity and significant technical debt, implementing API semantic enhancement was a major challenge, but also a learning opportunity.
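The extraction step of such a pipeline can be sketched as follows. This is a minimal illustration, not the actual Deepomics ETL: it assumes the ontology is available as N-Triples, that each term carries an `rdfs:label` matching an API field name, and that the names `OntologyTerm`, `extractTerms`, and `mapFields` are hypothetical. The final step (writing the mapping out as API Platform annotations) is left out because its exact format is not described here.

```typescript
// Hypothetical shape of one ontology term after extraction from RDF/OWL.
interface OntologyTerm {
  iri: string;   // full IRI of the class or property
  label: string; // rdfs:label, assumed here to match an API field name
}

const RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

// Extract (IRI, label) pairs from N-Triples lines of the form
//   <subject> <predicate> "literal" .
// Only plain string literals are handled; a language tag is accepted and ignored.
function extractTerms(nTriples: string): OntologyTerm[] {
  const pattern = /^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"(?:@[\w-]+)?\s*\.$/;
  const terms: OntologyTerm[] = [];
  for (const line of nTriples.split("\n")) {
    const match = pattern.exec(line.trim());
    if (match && match[2] === RDFS_LABEL) {
      terms.push({ iri: match[1], label: match[3] });
    }
  }
  return terms;
}

// Map each API field to the IRI whose label matches it.
// Fields without a matching term are reported so scientists can complete the ontology.
function mapFields(
  fields: string[],
  terms: OntologyTerm[]
): { mapped: Record<string, string>; unmapped: string[] } {
  const byLabel = new Map(terms.map((t) => [t.label, t.iri] as [string, string]));
  const mapped: Record<string, string> = {};
  const unmapped: string[] = [];
  for (const field of fields) {
    const iri = byLabel.get(field);
    if (iri !== undefined) {
      mapped[field] = iri;
    } else {
      unmapped.push(field);
    }
  }
  return { mapped, unmapped };
}
```

Reporting unmapped fields rather than failing silently fits the goal of letting scientists own the ontology: a missing term becomes a to-do for the ontology repository instead of a code change.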
Key decisions were made collectively during bi-weekly meetings. For API semantic enhancement, I presented a Proof of Concept (POC) before implementing the complete solution.
Results
The work delivered several results: numerous bug fixes, improved user feedback and ergonomics, a model change to support paired FASTQ files (two files instead of one), and semantic enhancement of the API using the JSON-LD format, compliant with a scientific ontology. A complex CI (continuous integration) pipeline was co-built to synchronize the ontology with the API. The repository for this ETL (extract, transform, load) is available here. Additionally, the e2e tests implemented with Robot Framework cover all key user journeys.
I learned to master the Symfony framework and the PHP language in depth, to work in a team with a clean CI setup, and to use Robot Framework with Selenium. Finally, the custom ETL work, which bridges API Platform and Semantic Web ontology formats (RDF/OWL), strengthened my technical skills.
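To make the paired-file model and the JSON-LD export more concrete, here is a small sketch. It is an illustration under assumptions, not the Deepomics data model: the type names, the `runs` path segment, and the `http://example.org/onto#` vocabulary are placeholders, and the real API is written in PHP with API Platform rather than TypeScript.

```typescript
// A run holds either one FASTQ file (single-end) or two (paired-end).
type FastqFiles =
  | { layout: "single"; file: string }
  | { layout: "paired"; forward: string; reverse: string };

// Hypothetical resource carrying the files.
interface SequencingRun {
  id: string;
  files: FastqFiles;
}

// Flatten the files into a list: one entry for single-end, two for paired-end.
function listFiles(files: FastqFiles): string[] {
  return files.layout === "paired"
    ? [files.forward, files.reverse]
    : [files.file];
}

// Serialize a run as a JSON-LD document.
// "@vocab" makes every plain key and the "@type" value resolve
// against the (placeholder) ontology namespace.
function toJsonLd(run: SequencingRun, baseIri: string) {
  return {
    "@context": { "@vocab": "http://example.org/onto#" },
    "@id": `${baseIri}/runs/${run.id}`,
    "@type": "SequencingRun",
    layout: run.files.layout,
    file: listFiles(run.files),
  };
}
```

Modeling the two layouts as a discriminated union lets the compiler reject a paired run with a missing reverse file, which is the kind of inconsistency a single-file model cannot express.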
Technical Stack
The project relies on the following tools and technologies:
- Backend: PHP, Symfony
- Frontend: TypeScript, Angular
- Tests: Robot Framework, Selenium
- Infrastructure: Docker Compose
- Documentation: Markdown
- Custom ETL: Node.js for ETL between ontology files and Symfony
It is important to note that this technical stack is inherited: I did not participate in the initial choices. The major technical challenges encountered include:
- Complexity of the Deepomics information system in terms of business rules
- Need to master both backend (PHP) and frontend (TypeScript)
- Existing technical debt requiring refactoring and code simplification
- Learning Robot Framework and Selenium for managing automated tests